Automatic transcription of general audio data: effect of environment segmentation on phonetic recognition

Authors

  • Michelle S. Spina
  • Victor Zue
Abstract

The task of automatically transcribing general audio data is very different from those usually confronted by current automatic speech recognition systems. The general goal of our work is to determine the optimal training strategy for recognizing such data. Specifically, we have studied the effects of different speaking environments on a phonetic recognition task using data collected from a radio news program. We found that if a single recognizer is to be used, it is more effective to use a smaller amount of homogeneous, clean data for training. This approach yielded a decrease in phonetic recognition error rate of over 26% compared with a system trained on an equivalent amount of data that contained a variety of speaking environments. We found that additional gains can be made with a multiple-recognizer system, trained with environment-specific data. Overall, we found that this approach yielded a decrease in error rate of nearly 2%, with some individual speaking environments’ error rate decreasing by


Similar resources

A Comparison of Different Approaches to Automatic Speech Segmentation

We compare different methods for obtaining accurate speech segmentations starting from the corresponding orthography. The complete segmentation process can be decomposed into two basic steps. First, a phonetic transcription is automatically produced with the help of large vocabulary continuous speech recognition (LVCSR). Then, the phonetic information and the speech signal serve as input to a s...


Automatic Phonetic Transcription of Non-Prompted Speech

Automatic Segmentation" (MAUS) system labels and segments the phonetic constituents of spoken German in a manner similar to highly trained phoneticians. MAUS has been used to train automatic speech recognition (ASR) systems as well as to provide detailed statistical analyses of spontaneous speech (using the Verbmobil I and RVG I corpora). The MAUS system is a reliable, automatic means of testin...


Morphologically Based Automatic Phonetic Transcription

A system is described that automatically generates phonetic transcriptions for German orthographic words. The entire generative process consists of two main steps. In the first step, the system segments the words into their morphs, or prefixes, stems, and suffixes. This segmentation is very important for the transcription of German words, because the pronunciation of the letters depends also on...


Improving the robustness of phonetic segmentation to accent and style variation with a two-staged approach

Correct and temporally accurate phonetic segmentation of speech utterances is important in applications ranging from transcription alignment to pronunciation error detection. Automatic speech recognizers used in these tasks provide insufficient temporal alignment accuracy apart from a recognition performance that is sensitive to accent and style variations from the training data. A two-staged a...


The segmentation of multi-channel meeting recordings for automatic speech recognition

One major research challenge in the domain of the analysis of meeting room data is the automatic transcription of what is spoken during meetings, a task which has gained considerable attention within the ASR research community through the NIST rich transcription evaluations conducted over the last three years. One of the major difficulties in carrying out automatic speech recognition (ASR) on t...




Publication date: 1997